Search results for "Hierarchical clustering"
showing 10 items of 56 documents
Quantum clustering in non-spherical data distributions: Finding a suitable number of clusters
2017
Quantum Clustering (QC) provides an alternative approach to clustering algorithms, several of which are based on geometric relationships between data points. Instead, QC makes use of quantum mechanics concepts to find structures (clusters) in data sets by finding the minima of a quantum potential. The starting point of QC is a Parzen estimator with a fixed length scale, which significantly affects the final cluster allocation. This dependence on an adjustable parameter is common to other methods. We propose a framework to find suitable values of the length parameter σ by optimising twin measures of cluster separation and consistency for a given cluster number. This is an extension of the Se…
FragClust and TestClust, two informatics tools for chemical structure hierarchical clustering analysis applied to lipidomics. The example of Alzheime…
2016
Lipidomic analysis is able to measure simultaneously thousands of compounds belonging to a few lipid classes. In each lipid class, compounds differ only by the acyl radical, ranging between C10:0 (capric acid) and C24:0 (lignoceric acid). Although some metabolites have a peculiar pathological role, more often compounds belonging to a single lipid class exert the same biological effect. Here, we present a lipidomics workflow that extracts the tandem mass spectrometry data from individual files and uses them to group compounds into structurally homogeneous clusters by chemical structure hierarchical clustering analysis (CHCA). The case-to-control peak area ratios of the metabolites are then a…
Multivariate statistical analysis of a large odorants database aimed at revealing similarities and links between odorants and odors
2017
The perception of odor is an important component of smell; the first step of odor detection, and the discrimination of structurally diverse odorants depends on their interactions with olfactory receptors (ORs). Indeed, the perception of an odor's quality results from a combinatorial coding, in which the deciphering remains a major challenge. Several studies have successfully established links between odors and odorants by categorizing and classifying data. Hence, the categorization of odors appears to be a promising way to manage odors. In the proposed study, we performed a computational analysis using odor descriptions of the odorants present in Flavor-Base 9th Edit…
Comparison of conventional descriptive analysis and a citation frequency-based descriptive method for odor profiling: An application to Burgundy Pino…
2010
The limitations of intensity scoring when describing the odor characteristics of a complex product have been documented in the literature. In the present work, the odor properties of 12 Burgundy Pinot noir wines were described by two independent panels performing, respectively, an intensity-based (conventional descriptive analysis) and a citation frequency-based method. Methods were compared according to three criteria: similarity of the sensory maps, control of panel performance and practical aspects. Intensity scoring and citation frequency data were analyzed, respectively, by Principal Components Analysis (PCA) and Correspondence Analysis (CA) followed by Hierarch…
A hierarchical cluster analysis to determine whether injured runners exhibit similar kinematic gait patterns
2020
Previous studies have suggested that runners can be subgrouped based on homogeneous gait patterns; however, no previous study has assessed the presence of such subgroups in a population of individuals across a wide variety of injuries. Therefore, the purpose of this study was to assess whether distinct subgroups with homogeneous running patterns can be identified among a large group of injured and healthy runners and whether identified subgroups are associated with specific injury location. Three‐dimensional kinematic data from 291 injured and healthy runners, representing both sexes and a wide range of ages (10‐66 years), were clustered using hierarchical cluster analysis. Cluster analysis r…
Impact of the COVID-19 pandemic on music: a method for clustering sentiments
2021
The outbreak of coronavirus disease 2019 (COVID-19) was highly stressful for people. In general, fear and anxiety about a disease can be overwhelming and cause strong emotions in adults and children. One way to cope with this stress is to listen to music. The aim of this work is to understand whether the music heard during the lockdown reflects the emotions generated by the pandemic in each of us. The primary goal of this work is therefore to build two indices for measuring the anger and joy levels of the top streamed songs by Italian Spotify users (during the SARS-CoV-2 pandemic), and to study their evolution over time. A Hierarchical Cluster Analysis has been applied in order to identify groups o…
An original way to evaluate daily rainfall variability simulated by a regional climate model: the case of South African austral summer rainfall
2014
We discuss the value of a clustering approach as a tool for evaluating daily rainfall output from climate models. Ascendant hierarchical clustering is used to evaluate how well South African recurrent daily rainfall patterns are simulated during the austral summer (December to February 1970–1971 to 1998–1999). A set of 35-km regional climate simulations, run with the WRF model and driven by the ERA40 reanalysis, is chosen as a case study. Six recurrent patterns are identified and compared to the observed clusters obtained by applying the same methodology to 5352 daily rain gauge records. Two of the WRF clusters describe either a persistent and widespread dryness (65% of the days) or pattern…
An Analysis of Regional and Intra-annual Precipitation Variability over Iran using Multivariate Statistical Methods
1998
The temporal and spatial precipitation regime of Iran was analysed using multivariate analyses of monthly mean precipitation records for 71 stations. A Principal Component Analysis was applied to the correlation matrix in order to describe the intra-annual variations of precipitation. The Principal Component scores were mapped to visualize the spatial structure of the three derived precipitation regimes. By applying an agglomerative clustering (WARD) of the three Principal Component scores, five homogeneous spatial clusters, representing five precipitation regions, were developed. The intra-annual types of precipitation distribution, shown by the five clusters, are described and discussed.
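The pipeline described in this abstract (PCA on the correlation matrix, then Ward clustering of the component scores) can be sketched roughly as follows. This is only an illustration, not the authors' code: the function name pca_ward_regions, its parameter names, and the synthetic 71 x 12 input are assumptions made for the example, and NumPy and SciPy are assumed to be available. The three components and five clusters mirror the numbers quoted in the abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def pca_ward_regions(monthly_means, n_components=3, n_clusters=5):
    """Hypothetical helper: PCA on the correlation matrix, then Ward clustering.

    monthly_means: array of shape (stations, months).
    Returns the component scores and one cluster label per station.
    """
    X = np.asarray(monthly_means, dtype=float)

    # Standardise each month so that the PCA acts on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)

    # Eigen-decomposition of the symmetric correlation matrix;
    # keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    scores = Z @ eigvecs[:, order]

    # Agglomerative (Ward) clustering of the stations in score space,
    # cut so that at most n_clusters groups remain.
    tree = linkage(scores, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return scores, labels


# Synthetic stand-in for the station records (NOT the data from the study).
rng = np.random.default_rng(0)
synthetic = rng.gamma(shape=2.0, scale=20.0, size=(71, 12))
scores, labels = pca_ward_regions(synthetic)
```

With real data, each row would hold one station's twelve monthly mean precipitation values; the resulting labels could then be mapped to inspect the spatial structure of the regions.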
Cluster-based active learning for compact image classification
2010
In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The first is represented as a complete tree to be pruned and the second is iteratively provided by the user. The active learning algorithm proposed searches the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to current pruning's uncertainty, sampling is focused on most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer…
Fast dendrogram-based OTU clustering using sequence embedding
2014
Biodiversity assessment is an important step in a metagenomic processing pipeline. The biodiversity of a microbial metagenome is often estimated by grouping its 16S rRNA reads into operational taxonomic units or OTUs. These metagenomic datasets are typically large and hence require effective yet accurate computational methods for processing. In this paper, we introduce a new hierarchical clustering method called CRiSPy-Embed which aims to produce high-quality clustering results at a low computational cost. We tackle two computational issues of the current OTU hierarchical clustering approach: (1) the compute-intensive sequence alignment operation for building the distance matrix and (2) the …